204 research outputs found

    Translation-Invariant Representation for Cumulative Foot Pressure Images

    Full text link
    Human can be distinguished by different limb movements and unique ground reaction force. Cumulative foot pressure image is a 2-D cumulative ground reaction force during one gait cycle. Although it contains pressure spatial distribution information and pressure temporal distribution information, it suffers from several problems including different shoes and noise, when putting it into practice as a new biometric for pedestrian identification. In this paper, we propose a hierarchical translation-invariant representation for cumulative foot pressure images, inspired by the success of Convolutional deep belief network for digital classification. Key contribution in our approach is discriminative hierarchical sparse coding scheme which helps to learn useful discriminative high-level visual features. Based on the feature representation of cumulative foot pressure images, we develop a pedestrian recognition system which is invariant to three different shoes and slight local shape change. Experiments are conducted on a proposed open dataset that contains more than 2800 cumulative foot pressure images from 118 subjects. Evaluations suggest the effectiveness of the proposed method and the potential of cumulative foot pressure images as a biometric.Comment: 6 page

    A Light CNN for Deep Face Representation with Noisy Labels

    Full text link
    The volume of convolutional neural network (CNN) models proposed for face recognition has been continuously growing larger to better fit large amount of training data. When training data are obtained from internet, the labels are likely to be ambiguous and inaccurate. This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels. First, we introduce a variation of maxout activation, called Max-Feature-Map (MFM), into each convolutional layer of CNN. Different from maxout activation that uses many feature maps to linearly approximate an arbitrary convex activation function, MFM does so via a competitive relationship. MFM can not only separate noisy and informative signals but also play the role of feature selection between two feature maps. Second, three networks are carefully designed to obtain better performance meanwhile reducing the number of parameters and computational costs. Lastly, a semantic bootstrapping method is proposed to make the prediction of the networks more consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational costs and storage spaces. The learned single network with a 256-D representation achieves state-of-the-art results on various face benchmarks without fine-tuning. The code is released on https://github.com/AlfredXiangWu/LightCNN.Comment: arXiv admin note: text overlap with arXiv:1507.04844. The models are released on https://github.com/AlfredXiangWu/LightCNN, IEEE Transactions on Information Forensics and Security, 201

    Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval

    Full text link
    With the rapid growth of web images, hashing has received increasing interests in large scale image retrieval. Research efforts have been devoted to learning compact binary codes that preserve semantic similarity based on labels. However, most of these hashing methods are designed to handle simple binary similarity. The complex multilevel semantic structure of images associated with multiple labels have not yet been well explored. Here we propose a deep semantic ranking based method for learning hash functions that preserve multilevel semantic similarity between multi-label images. In our approach, deep convolutional neural network is incorporated into hash functions to jointly learn feature representations and mappings from them to hash codes, which avoids the limitation of semantic representation power of hand-crafted features. Meanwhile, a ranking list that encodes the multilevel similarity information is employed to guide the learning of such deep hash functions. An effective scheme based on surrogate loss is used to solve the intractable optimization problem of nonsmooth and multivariate ranking measures involved in the learning procedure. Experimental results show the superiority of our proposed approach over several state-of-the-art hashing methods in term of ranking evaluation metrics when tested on multi-label image datasets.Comment: CVPR 201

    Coupled Deep Learning for Heterogeneous Face Recognition

    Full text link
    Heterogeneous face matching is a challenge issue in face recognition due to large domain difference as well as insufficient pairwise images in different modalities during training. This paper proposes a coupled deep learning (CDL) approach for the heterogeneous face matching. CDL seeks a shared feature space in which the heterogeneous face matching problem can be approximately treated as a homogeneous face matching problem. The objective function of CDL mainly includes two parts. The first part contains a trace norm and a block-diagonal prior as relevance constraints, which not only make unpaired images from multiple modalities be clustered and correlated, but also regularize the parameters to alleviate overfitting. An approximate variational formulation is introduced to deal with the difficulties of optimizing low-rank constraint directly. The second part contains a cross modal ranking among triplet domain specific images to maximize the margin for different identities and increase data for a small amount of training samples. Besides, an alternating minimization method is employed to iteratively update the parameters of CDL. Experimental results show that CDL achieves better performance on the challenging CASIA NIR-VIS 2.0 face recognition database, the IIIT-D Sketch database, the CUHK Face Sketch (CUFS), and the CUHK Face Sketch FERET (CUFSF), which significantly outperforms state-of-the-art heterogeneous face recognition methods.Comment: AAAI 201

    Deep Steganalysis: End-to-End Learning with Supervisory Information beyond Class Labels

    Full text link
    Recently, deep learning has shown its power in steganalysis. However, the proposed deep models have been often learned from pre-calculated noise residuals with fixed high-pass filters rather than from raw images. In this paper, we propose a new end-to-end learning framework that can learn steganalytic features directly from pixels. In the meantime, the high-pass filters are also automatically learned. Besides class labels, we make use of additional pixel level supervision of cover-stego image pair to jointly and iteratively train the proposed network which consists of a residual calculation network and a steganalysis network. The experimental results prove the effectiveness of the proposed architecture

    Deep Supervised Discrete Hashing

    Full text link
    With the rapid growth of image and video data on the web, hashing has been extensively studied for image or video search in recent years. Benefit from recent advances in deep learning, deep hashing methods have achieved promising results for image retrieval. However, there are some limitations of previous deep hashing methods (e.g., the semantic information is not fully exploited). In this paper, we develop a deep supervised discrete hashing algorithm based on the assumption that the learned binary codes should be ideal for classification. Both the pairwise label information and the classification information are used to learn the hash codes within one stream framework. We constrain the outputs of the last layer to be binary codes directly, which is rarely investigated in deep hashing algorithm. Because of the discrete nature of hash codes, an alternating minimization method is used to optimize the objective function. Experimental results have shown that our method outperforms current state-of-the-art methods on benchmark datasets.Comment: Accepted by NIPS 201

    Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition

    Full text link
    Heterogeneous face recognition (HFR) aims to match facial images acquired from different sensing modalities with mission-critical applications in forensics, security and commercial sectors. However, HFR is a much more challenging problem than traditional face recognition because of large intra-class variations of heterogeneous face images and limited training samples of cross-modality face image pairs. This paper proposes a novel approach namely Wasserstein CNN (convolutional neural networks, or WCNN for short) to learn invariant features between near-infrared and visual face images (i.e. NIR-VIS face recognition). The low-level layers of WCNN are trained with widely available face images in visual spectrum. The high-level layer is divided into three parts, i.e., NIR layer, VIS layer and NIR-VIS shared layer. The first two layers aims to learn modality-specific features and NIR-VIS shared layer is designed to learn modality-invariant feature subspace. Wasserstein distance is introduced into NIR-VIS shared layer to measure the dissimilarity between heterogeneous feature distributions. So W-CNN learning aims to achieve the minimization of Wasserstein distance between NIR distribution and VIS distribution for invariant deep feature representation of heterogeneous face images. To avoid the over-fitting problem on small-scale heterogeneous face data, a correlation prior is introduced on the fully-connected layers of WCNN network to reduce parameter space. This prior is implemented by a low-rank constraint in an end-to-end network. The joint formulation leads to an alternating minimization for deep feature representation at training stage and an efficient computation for heterogeneous data at testing stage. Extensive experiments on three challenging NIR-VIS face recognition databases demonstrate the significant superiority of Wasserstein CNN over state-of-the-art methods

    Accelerating Deep Neural Networks with Spatial Bottleneck Modules

    Full text link
    This paper presents an efficient module named spatial bottleneck for accelerating the convolutional layers in deep neural networks. The core idea is to decompose convolution into two stages, which first reduce the spatial resolution of the feature map, and then restore it to the desired size. This operation decreases the sampling density in the spatial domain, which is independent yet complementary to network acceleration approaches in the channel domain. Using different sampling rates, we can tradeoff between recognition accuracy and model complexity. As a basic building block, spatial bottleneck can be used to replace any single convolutional layer, or the combination of two convolutional layers. We empirically verify the effectiveness of spatial bottleneck by applying it to the deep residual networks. Spatial bottleneck achieves 2x and 1.4x speedup on the regular and channel-bottlenecked residual blocks, respectively, with the accuracies retained in recognizing low-resolution images, and even improved in recognizing high-resolution images.Comment: 9 pages, 5 figure

    Learning Structured Ordinal Measures for Video based Face Recognition

    Full text link
    This paper presents a structured ordinal measure method for video-based face recognition that simultaneously learns ordinal filters and structured ordinal features. The problem is posed as a non-convex integer program problem that includes two parts. The first part learns stable ordinal filters to project video data into a large-margin ordinal space. The second seeks self-correcting and discrete codes by balancing the projected data and a rank-one ordinal matrix in a structured low-rank way. Unsupervised and supervised structures are considered for the ordinal matrix. In addition, as a complement to hierarchical structures, deep feature representations are integrated into our method to enhance coding stability. An alternating minimization method is employed to handle the discrete and low-rank constraints, yielding high-quality codes that capture prior structures well. Experimental results on three commonly used face video databases show that our method with a simple voting classifier can achieve state-of-the-art recognition rates using fewer features and samples

    Supervised Discrete Hashing with Relaxation

    Full text link
    Data-dependent hashing has recently attracted attention due to being able to support efficient retrieval and storage of high-dimensional data such as documents, images, and videos. In this paper, we propose a novel learning-based hashing method called "Supervised Discrete Hashing with Relaxation" (SDHR) based on "Supervised Discrete Hashing" (SDH). SDH uses ordinary least squares regression and traditional zero-one matrix encoding of class label information as the regression target (code words), thus fixing the regression target. In SDHR, the regression target is instead optimized. The optimized regression target matrix satisfies a large margin constraint for correct classification of each example. Compared with SDH, which uses the traditional zero-one matrix, SDHR utilizes the learned regression target matrix and, therefore, more accurately measures the classification error of the regression model and is more flexible. As expected, SDHR generally outperforms SDH. Experimental results on two large-scale image datasets (CIFAR-10 and MNIST) and a large-scale and challenging face dataset (FRGC) demonstrate the effectiveness and efficiency of SDHR
    • …
    corecore